In this paper we describe our research on the development of a four-layered architecture for Heterogeneous Distributed Database Management Systems (HDDBMS). The architecture includes the local schema, local object schema, global schema, and global view schema. This architecture was developed to support the propagation of local database semantics (e.g., integrity constraints, context) to the global schema and global view. Constraints propagated to the global level can be used to derive new constraints that could not have been recognized by any of the local components. These constraints can significantly reduce query processing costs in the HDDBMS environment by permitting the incorporation of techniques similar to semantic query optimization in the single-database environment [CFM84,HZ80,Kin81,SSS91]. These techniques are applied to the global query to identify candidate databases and reduce the number of local databases that must be accessed.

So far, the local, global, and view layers have been defined by passive objects (i.e., objects without methods). As a result, changes to the semantics at the local schema have to be propagated manually to the global level in order to maintain a globally consistent set of integrity constraints. We are currently investigating the use of active objects as components of our four-layer architecture, capable of triggering the semantic changes needed to maintain a consistent set of global integrity constraints.

In Section 1, we summarize the key components of the four-layer architecture and describe the derivation of global integrity constraints. In Section 2, we describe the role of semantic query processing at the global level and compare it with existing semantic query optimization techniques. Finally, in Section 3, we present our vision for the use of active objects to maintain the consistency of the mapping knowledge and of the global integrity constraints.

1 Integration Model

A methodology for designing an HDDBMS was proposed in [RPR89,Red90]; this methodology uses a four-layered schema architecture consisting of local schemata, local object schemata, a global schema, and a global view schema, as shown in Figure 1. Each layer presents an integrated view of the concepts that characterize the layer below.

Figure 1: Schema Architecture of a four-layered HDDBMS

1.1 Local Schema

The bottom layer consists of a set of local database schemata. Each local database schema is denoted by $D_i$, where $i$ denotes the identification of the database. These schemata describe the stored data in their respective data models, and the stored data can be retrieved only by using the respective local query languages.


1.2 Local Object Schema

One local object schema is constructed for each local schema. For a given Local Schema $D_i$, the construction of its corresponding Object Schema $OD_i$ involves the identification of the set $S_i$, which gives the distinct object types in the schema $D_i$, the semantic meaning of the data associated with every instance of the objects in $S_i$, and the constraints associated with these objects. The knowledge that maps objects in $S_i$ to their corresponding data structures in $D_i$ is also placed at this layer.

An object in $S_i$ is any distinguishable entity whose description is available in the Local Schema $D_i$. A database object is denoted by $O_I$, where $I$ is a unique object identifier: $I$ consists of a pair of indices, say $(i, j)$, where the first index $i$ specifies the schema identification and the second index $j$ provides the object identification within the schema. Each object possesses a set of properties. A property is denoted by $P_k$, where $k$ is a unique property identifier; $k$ is expressed as a pair $I.p_l$, where $I$ is its object identifier and $p_l$ is the property identifier with respect to the object $O_I$. The Property Set associated with the object $O_I$ is denoted by $PS_{O_I}$. The object $O_I$ is characterized by its properties; this characterization is denoted by $O_I \Leftrightarrow PS_{O_I}$. The key property of the object $O_I$ is denoted by $K_I \in PS_{O_I}$.
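The identifier scheme can be made concrete with a small data structure. The following Python sketch encodes $I = (i, j)$, $k = I.p_l$, the property set $PS_{O_I}$, and the key $K_I$; it is a minimal illustration under our own assumptions, not the methodology's implementation. The class names and the example object (a hypothetical object $O_{(1,1)}$ in $D_1$ with a hypothetical key property T-ID alongside T-SAL) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class ObjectId:
    """Object identifier I = (i, j)."""
    schema: int  # i: identification of the local schema D_i
    local: int   # j: identification of the object within D_i


@dataclass(frozen=True)
class PropertyId:
    """Property identifier k = I.p_l."""
    obj: ObjectId  # I: identifier of the owning object O_I
    local: str     # p_l: property identifier relative to O_I


@dataclass
class LocalObject:
    """O_I, characterized by its property set PS_{O_I}."""
    oid: ObjectId
    properties: Set[PropertyId] = field(default_factory=set)
    key: Optional[PropertyId] = None  # K_I, always a member of PS_{O_I}

    def add_property(self, name: str, is_key: bool = False) -> PropertyId:
        pid = PropertyId(self.oid, name)
        self.properties.add(pid)
        if is_key:
            self.key = pid  # the key is added to PS_{O_I} first, so K_I is in PS_{O_I}
        return pid


# Hypothetical object O_(1,1) in D_1 with key T-ID and the property T-SAL.
teacher = LocalObject(ObjectId(1, 1))
t_id = teacher.add_property("T-ID", is_key=True)
t_sal = teacher.add_property("T-SAL")
```

Making the identifiers frozen dataclasses keeps them hashable, so they can be stored in the property set directly.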

A property can itself be characterized by a set of meta-properties. Meta-properties are the parameters needed to provide a complete semantic meaning to the symbols associated with the property. For example, PERIODICITY-OF-PAY and CURRENCY represent the meta-properties of the property T-SAL.


Let $M^{P_k}$ denote the set of meta-properties associated with the property $P_k$, and let $|M^{P_k}|$ denote the number of meta-properties associated with $P_k$. For each meta-property there is a set of legal meta-values. DOLLAR, RUPEE, and POUND are some of the meta-values for the meta-property CURRENCY; similarly, WEEKLY, MONTHLY, and YEARLY are some of the meta-values for the meta-property PERIODICITY-OF-PAY. Further, if $V_k^l$ is the meta-value of the property $P_k$ associated with the meta-property $M^l$, we define $M^l(P_k) = V_k^l$. These meta-values are used to recognize semantic incompatibilities among similar concepts in different layers.
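The definitions of $M^{P_k}$, $|M^{P_k}|$, and $M^l(P_k) = V_k^l$ can be sketched as a property that carries a dictionary of meta-values, validated against the legal meta-values of each meta-property. This is a hypothetical Python illustration: the class name is ours, and the legal-value sets contain only the values named in the text, since the text lists "some of" the legal values.

```python
from typing import Dict, Set

# Legal meta-values per meta-property; only the values named in the text.
LEGAL_META_VALUES: Dict[str, Set[str]] = {
    "CURRENCY": {"DOLLAR", "RUPEE", "POUND"},
    "PERIODICITY-OF-PAY": {"WEEKLY", "MONTHLY", "YEARLY"},
}


class DescribedProperty:
    """A property P_k together with its meta-values V_k^l."""

    def __init__(self, name: str, meta_values: Dict[str, str]):
        # Reject unknown meta-properties and illegal meta-values up front.
        for meta_prop, value in meta_values.items():
            legal = LEGAL_META_VALUES.get(meta_prop)
            if legal is None:
                raise ValueError(f"unknown meta-property {meta_prop}")
            if value not in legal:
                raise ValueError(f"{value} is not a legal meta-value of {meta_prop}")
        self.name = name
        self._meta = dict(meta_values)

    def meta_properties(self) -> Set[str]:
        """M^{P_k}: the meta-properties of this property."""
        return set(self._meta)

    def count(self) -> int:
        """|M^{P_k}|: the number of meta-properties."""
        return len(self._meta)

    def meta_value(self, meta_prop: str) -> str:
        """M^l(P_k) = V_k^l."""
        return self._meta[meta_prop]


# T-SAL as described in the text: a monthly salary paid in rupees.
t_sal = DescribedProperty("T-SAL", {"CURRENCY": "RUPEE", "PERIODICITY-OF-PAY": "MONTHLY"})
```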

1.3 Global Schema

The global schema is derived from the component local schemata. Objects in the component schemata are first pooled together and then decomposed into object equivalence classes by comparing their real-world states [NEL86]. Two objects in the same equivalence class must have the same real-world states. Each object equivalence class gives one global object type. Further, each local object in an object equivalence class constitutes a component of the global object derived from that class. If $O_L$ is a global object and $O_I$ is its component, we denote this relation as $O_I \in O_L$.
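The pooling and decomposition step can be sketched as a partition of all local objects by their real-world state. Comparing real-world states is a design-time judgment in the methodology; here we simply assume each local object has been annotated with a real-world-state label, and the schema contents (TEACHER, FACULTY, DEPT) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

LocalRef = Tuple[int, str]  # (schema identification i, local object name)


def object_equivalence_classes(
    local_schemas: Dict[int, List[str]],
    real_world_state: Dict[LocalRef, Hashable],
) -> Dict[Hashable, List[LocalRef]]:
    """Pool the objects of all local object schemata, then partition them so
    that objects in one class share the same real-world-state label."""
    classes: Dict[Hashable, List[LocalRef]] = defaultdict(list)
    for schema_id, objects in local_schemas.items():
        for name in objects:
            ref = (schema_id, name)
            classes[real_world_state[ref]].append(ref)
    # Each class yields one global object type O_L; its members are the
    # components O_I of that global object.
    return dict(classes)


schemas = {1: ["TEACHER"], 2: ["FACULTY", "DEPT"]}
state = {(1, "TEACHER"): "faculty member",
         (2, "FACULTY"): "faculty member",
         (2, "DEPT"): "department"}
global_objects = object_equivalence_classes(schemas, state)
```

Here TEACHER and FACULTY become components of one global object, while DEPT forms a global object on its own.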

To compute the properties of a global object, we compute the union of the properties of all its components and decompose this union into a number of property equivalence classes, where each property equivalence class provides one property for the global object. All properties in one property equivalence class are called components of the global property derived from that class. If $P_L$ is a global property and $P_I$ is its component, we denote this relation as $P_I \in P_L$. The semantic meaning of a global property is fixed by assigning a meta-value to each of its meta-properties. Two transformation maps, $T_{P_I P_L}$ and $T_{P_L P_I}$, are defined, which make $P_I$ semantically compatible with $P_L$ and $P_L$ semantically compatible with $P_I$, respectively.
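The property-level step follows the same pattern as the object-level one: take the union of the component properties and partition it into property equivalence classes. In this hypothetical Python sketch, the `concept_of` labelling stands in for the designer's judgment of which properties describe the same concept, and the property names other than T-SAL, F-PAY, and FAC-RANK are assumptions. The meta-values fixed for the global SALARY property are chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def global_properties(components: Dict[str, Set[str]],
                      concept_of: Dict[str, str]) -> Dict[str, Set[str]]:
    """Union the properties of all components of a global object, then split
    the union into property equivalence classes keyed by concept."""
    union = set().union(*components.values())
    classes: Dict[str, Set[str]] = defaultdict(set)
    for prop in union:
        classes[concept_of[prop]].add(prop)
    return dict(classes)  # one global property P_L per class


components = {"TEACHER": {"T-ID", "T-SAL"},
              "FACULTY": {"FAC-ID", "F-PAY", "FAC-RANK"}}
concept_of = {"T-ID": "ID", "FAC-ID": "ID", "T-SAL": "SALARY",
              "F-PAY": "SALARY", "FAC-RANK": "RANK"}
props = global_properties(components, concept_of)

# Fix the semantic meaning of the global SALARY property by assigning a
# meta-value to each of its meta-properties (illustrative choice).
salary_meta = {"CURRENCY": "DOLLAR", "PERIODICITY-OF-PAY": "YEARLY"}
```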

Two properties $P_I$ and $P_L$ are said to be meta-value compatible with respect to the meta-property $M^l$ if and only if $M^l(P_I) = M^l(P_L)$, that is, if and only if $V_I^l = V_L^l$; this compatibility is denoted by $P_I \sim^{M^l} P_L$.

If T-SAL is the monthly salary paid in rupees and F-PAY is the annual salary paid in dollars, these two properties are not meta-value compatible with respect to either PERIODICITY-OF-PAY or CURRENCY.
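The compatibility test is a direct comparison of meta-values, which the following Python sketch applies to the T-SAL / F-PAY example. Representing a property by a plain dictionary from meta-property to meta-value is our simplifying assumption.

```python
from typing import Dict, Set

MetaValues = Dict[str, str]  # meta-property -> meta-value

# The two properties from the example in the text.
T_SAL: MetaValues = {"CURRENCY": "RUPEE", "PERIODICITY-OF-PAY": "MONTHLY"}
F_PAY: MetaValues = {"CURRENCY": "DOLLAR", "PERIODICITY-OF-PAY": "YEARLY"}


def meta_value_compatible(p: MetaValues, q: MetaValues, meta_prop: str) -> bool:
    """True iff M^l(p) == M^l(q) for the meta-property M^l = meta_prop."""
    return p[meta_prop] == q[meta_prop]


def incompatible_meta_properties(p: MetaValues, q: MetaValues) -> Set[str]:
    """Shared meta-properties on which p and q are not meta-value compatible."""
    return {m for m in p.keys() & q.keys() if not meta_value_compatible(p, q, m)}
```

As stated in the text, T-SAL and F-PAY disagree on both meta-properties, so both appear in `incompatible_meta_properties(T_SAL, F_PAY)`.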
• Transformation Map

If a property $P_I$ is not meta-value compatible with $P_L$ with respect to the meta-property $M^l$, then it is possible to define a transformation map $t^{M^l}_{P_I P_L}$ which makes $P_I$ meta-value compatible with $P_L$ with respect to $M^l$. Note that $t^{M^l}_{P_I P_L}$ may be a look-up table.

In the above example, F-PAY is not compatible with T-SAL with respect to the meta-property CURRENCY. Meta-value compatibility can be obtained with the transformation map $t^{\mathrm{CURRENCY}}_{\text{T-SAL},\text{F-PAY}}$, such that

$$t^{\mathrm{CURRENCY}}_{\text{T-SAL},\text{F-PAY}}(\text{T-SAL}) \sim^{\mathrm{CURRENCY}} \text{F-PAY}$$

Here $t^{\mathrm{CURRENCY}}_{\text{T-SAL},\text{F-PAY}}(\text{T-SAL})$ is $\frac{1}{24}$ times T-SAL, assuming that 1 dollar equals 24 rupees.

• Composite Transformation Map

Two properties $P_I$ and $P_L$ in $[P_k]$ are defined to be semantically compatible with each other if and only if they are meta-value compatible with respect to all meta-properties pertinent to these properties. This is symbolically denoted by $P_I \sim P_L$. Further, if $P_I$ and $P_L$ are not semantically compatible, then a composite transformation map $T_{P_I P_L}$ can be defined which makes $P_I$ semantically compatible with $P_L$. Suppose $t^{M^1}_{P_I P_L}, t^{M^2}_{P_I P_L}, \ldots, t^{M^n}_{P_I P_L}$, where $n = |M^{P_I}|$, are the transformation maps which make $P_I$ meta-value compatible with $P_L$ with respect to the meta-properties $M^1, M^2, \ldots, M^n$, respectively. The composite transformation map can then be defined as follows:

$$
\begin{aligned}
T_{P_I P_L}(P_I) &= \left(t^{M^1}_{P_I P_L} \circ t^{M^2}_{P_I P_L} \circ \cdots \circ t^{M^n}_{P_I P_L}\right)(P_I) \\
&= t^{M^1}_{P_I P_L}\left(t^{M^2}_{P_I P_L}\left(\cdots\left(t^{M^n}_{P_I P_L}(P_I)\right)\cdots\right)\right) \\
&\sim P_L
\end{aligned}
$$


Note that if $P_I$ and $P_L$ are already compatible with respect to a particular meta-property, the corresponding transformation map can be omitted from the composite transformation map. By using composite transformation maps, homogeneity among the component properties can be achieved, thereby resolving semantic incompatibilities during query decomposition and data integration.
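Both kinds of map can be sketched for the running T-SAL / F-PAY example, assuming every single-meta-property map is a multiplication by a conversion factor taken from a look-up table. The currency factor uses the text's assumption that 1 dollar equals 24 rupees; the periodicity factor (12 monthly payments per year) is the only other entry needed. The table layout and function names are hypothetical. Because multiplications commute, the order of composition does not change the result here, although it may matter for general look-up-table maps.

```python
from fractions import Fraction
from functools import reduce
from typing import Callable, Dict

Map = Callable[[Fraction], Fraction]

# Conversion factors relative to a reference meta-value (look-up tables).
# CURRENCY: dollars per unit, assuming 1 dollar = 24 rupees.
# PERIODICITY-OF-PAY: payments per year.
FACTORS: Dict[str, Dict[str, Fraction]] = {
    "CURRENCY": {"DOLLAR": Fraction(1), "RUPEE": Fraction(1, 24)},
    "PERIODICITY-OF-PAY": {"YEARLY": Fraction(1), "MONTHLY": Fraction(12)},
}


def simple_map(meta_prop: str, source: str, target: str) -> Map:
    """t^{M^l}_{P_I P_L}: makes a value meta-value compatible w.r.t. meta_prop."""
    factor = FACTORS[meta_prop][source] / FACTORS[meta_prop][target]
    return lambda value: value * factor


def composite_map(p: Dict[str, str], q: Dict[str, str]) -> Map:
    """T_{P_I P_L}: composes t^{M^l} for every shared meta-property on which
    p and q disagree; already-compatible meta-properties are skipped."""
    maps = [simple_map(m, p[m], q[m])
            for m in sorted(p.keys() & q.keys()) if p[m] != q[m]]
    # Reducing with an identity seed yields x -> t1(t2(...tn(x))).
    return reduce(lambda f, g: (lambda x: f(g(x))), maps, lambda x: x)


T_SAL = {"CURRENCY": "RUPEE", "PERIODICITY-OF-PAY": "MONTHLY"}
F_PAY = {"CURRENCY": "DOLLAR", "PERIODICITY-OF-PAY": "YEARLY"}

t_currency = simple_map("CURRENCY", "RUPEE", "DOLLAR")  # 1/24 times T-SAL
to_f_pay = composite_map(T_SAL, F_PAY)
print(t_currency(Fraction(24000)))  # 1000
print(to_f_pay(Fraction(24000)))    # 12000
```

A monthly salary of 24000 rupees thus becomes 1000 dollars per month after the currency map alone, and 12000 dollars per year after the composite map. `Fraction` keeps the conversions exact.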

This completes the integration of objects having the same real-world states.

If the real-world state of an object $O_L$ is contained in that of another object $O_K$, then $O_K$ is said to have a superclass relationship with $O_L$. If there is a superclass-subclass relationship among global object types, the subclass object inherits all the properties of the superclass object. All these global objects form the third layer in the architecture.
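Property inheritance along superclass relationships can be sketched as a walk up the superclass chain, accumulating properties. This Python illustration assumes single inheritance (each global object type has at most one superclass) and uses hypothetical object types PERSON and FACULTY; since containment of real-world states should be strict, the cycle check is only a safeguard.

```python
from typing import Dict, Optional, Set


def inherited_properties(obj: str,
                         own: Dict[str, Set[str]],
                         superclass: Dict[str, Optional[str]]) -> Set[str]:
    """All properties of a global object type: its own plus those inherited
    along its superclass chain (single inheritance assumed)."""
    props: Set[str] = set()
    seen: Set[str] = set()
    current: Optional[str] = obj
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in superclass chain at {current}")
        seen.add(current)
        props |= own[current]
        current = superclass.get(current)  # None once the root is reached
    return props


own = {"PERSON": {"NAME"}, "FACULTY": {"FAC-RANK", "F-PAY"}}
superclass = {"FACULTY": "PERSON"}
```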

In the next subsection we discuss a method for deriving the constraints associated with the global schema from the constraints available at the component local object schemata.

1.3.1 Constraints at Global Schema Level

The semantic knowledge of the global schema and the mapping knowledge between the global schema and the component local object schemata are used to transform the constraints on the local object schemata into a set of global constraints. A detailed procedure for this transformation is presented in [RPG92].
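For the simplest case, the use of mapping knowledge can be illustrated as follows. If a local constraint bounds a local property and the composite transformation map to the global property is multiplication by a positive factor, the bound is multiplied by the same factor and the comparison operator is unchanged. This Python sketch is not the procedure of [RPG92]; the local constraint on T-SAL and the choice of yearly dollars for the global salary property are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

ORDER_OPS = {"=", ">", ">=", "<", "<="}


def propagate_constraint(op: str, bound: Fraction,
                         factor: Fraction) -> Tuple[str, Fraction]:
    """Rewrite the local constraint `P_I op bound` as a constraint on the
    global property P_L, assuming T_{P_I P_L} is x -> x * factor."""
    if op not in ORDER_OPS:
        raise ValueError(f"unsupported operator {op}")
    if factor <= 0:
        # A non-positive factor would flip or collapse the inequality.
        raise ValueError("this sketch only handles order-preserving maps")
    return op, bound * factor


# Hypothetical LIC: T-SAL > 240000 (monthly rupees). With the global salary
# in yearly dollars, T_{T-SAL, SALARY}(x) = x * 12 / 24.
local_lic = (">", Fraction(240000))
global_gic = propagate_constraint(*local_lic, Fraction(12, 24))
print(global_gic)
```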

Certain constraints are relevant only at the global schema level and not at any of the component local object schemata. For example, consider the two relational schemata shown in Figure 2. The constraints shown at the local level are propagated to the global level [RPG92] and used to derive new global constraints. In particular, the constraint FAC-RANK = 'Professor' → FAC-OFFICE-TYPE = 'A' can only be derived at the global level.

Figure 2: Example Derivation of Global Integrity Constraints
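One simple way such a global constraint can arise is by chaining implications that come from different databases: if the conclusion of one propagated constraint entails the premise of another, their combination is a new constraint that neither database could state alone. Because the constraints of Figure 2 are not reproduced in this text, the two propagated constraints below (on a hypothetical global SALARY property in dollars per year) are illustrative stand-ins, and the single-step chaining is our simplification, not the derivation procedure of [RPG92].

```python
from typing import List, NamedTuple


class Cond(NamedTuple):
    attr: str
    op: str        # "=", ">" or ">="
    value: object


class Rule(NamedTuple):
    source: str    # database (or "global") the constraint comes from
    premise: Cond
    conclusion: Cond


def entails(c1: Cond, c2: Cond) -> bool:
    """True if every value satisfying c1 also satisfies c2."""
    if c1.attr != c2.attr:
        return False
    if c2.op == "=":
        return c1.op == "=" and c1.value == c2.value
    if c1.op == "=":
        return c1.value > c2.value if c2.op == ">" else c1.value >= c2.value
    # Both are lower bounds; ">= v" only entails "> w" when v > w.
    if c1.op == ">=" and c2.op == ">":
        return c1.value > c2.value
    return c1.value >= c2.value


def derive_global(rules: List[Rule]) -> List[Rule]:
    """One chaining step: combine rules from different sources whenever the
    conclusion of one entails the premise of the other."""
    derived = []
    for r1 in rules:
        for r2 in rules:
            if r1.source != r2.source and entails(r1.conclusion, r2.premise):
                derived.append(Rule("global", r1.premise, r2.conclusion))
    return derived


# Hypothetical propagated constraints (salary in dollars per year).
rules = [
    Rule("DB1", Cond("FAC-RANK", "=", "Professor"), Cond("SALARY", ">=", 120000)),
    Rule("DB2", Cond("SALARY", ">", 100000), Cond("FAC-OFFICE-TYPE", "=", "A")),
]
for gic in derive_global(rules):
    print(gic.premise, "->", gic.conclusion)
```

With these inputs the only derived rule is FAC-RANK = 'Professor' → FAC-OFFICE-TYPE = 'A', which mirrors the global constraint named in the text.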

The above discussion shows that meaningful interaction among different layers depends on the semantics of similar concepts in different layers, which in turn depends on the correctness of the composite transformation maps defined between these layers. Our previous work suggests that these composite transformation maps need to be redefined manually whenever the semantics of the concepts present in these layers change.

1.4 View Object Schema

Some of the objects in the third layer may possess disjoint or overlapping domains. Global users may require these objects to be integrated, creating a need to generalize such objects to produce global views. Each global object that is generalized to produce a global view is called a component of the view object.

The properties of the global view object are derived by first computing the union of the properties of the component objects. This union is decomposed into property equivalence classes, and a property equivalence class is retained only if it contains one property from each and every component of the view object. Each retained property equivalence class provides one property for the global view object.
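The retention rule for view properties can be sketched as a filter over property equivalence classes. In this Python illustration we read "one property from each and every component" as exactly one property per component; the component objects STUDENT and FACULTY and their properties are hypothetical, and `concept_of` again stands in for the designer's equivalence judgment.

```python
from collections import defaultdict
from typing import Dict, List, Set


def view_properties(components: Dict[str, Set[str]],
                    concept_of: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Properties of a global view object: keep a property equivalence class
    only if it holds exactly one property from every component object."""
    classes: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for obj, props in components.items():
        for prop in props:
            classes[concept_of[prop]][obj].append(prop)
    return {
        concept: {obj: ps[0] for obj, ps in by_obj.items()}
        for concept, by_obj in classes.items()
        if set(by_obj) == set(components)
        and all(len(ps) == 1 for ps in by_obj.values())
    }


# Hypothetical global objects STUDENT and FACULTY generalized into a view.
components = {"STUDENT": {"S-NAME", "S-GPA"}, "FACULTY": {"F-NAME", "F-PAY"}}
concept_of = {"S-NAME": "NAME", "F-NAME": "NAME", "S-GPA": "GPA", "F-PAY": "SALARY"}
view = view_properties(components, concept_of)
```

Only the NAME class survives, because GPA and SALARY each have a property from just one component.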

The following section outlines the potential benefits of global integrity constraints (GICs) and the need to maintain their consistency.


2 Using GICs in Semantic Query Processing

In [RSG92] we describe an algorithm for using GICs in semantic query processing (SQP). Applying semantic query processing to global queries can yield significant savings. Some of the key optimization techniques introduced in our GIC-based query processing strategy are:

• Null Queries: Null queries are queries whose qualification contradicts the known constraints, so their answer is necessarily empty. Rejecting null global queries at the initial stage reduces the average query response time. Null queries are typically entered by users who do not possess adequate knowledge of the explicit and implicit relations among the objects/entities. This is especially true in an HDDBMS environment, where the global schema is generally large and difficult for the user to understand completely.

• Deduction of Query Results: SQP facilitates the deduction of the values of target attributes using the available semantic knowledge and the query qualification. If all target attributes can be deduced, the complete query may be answered from semantic knowledge alone. Even when not all the target properties are deducible, deducing a subset of them may eliminate the need to generate one or more subqueries.

• Avoidance of Large Search Space: Because the search space comprises the union of all the component databases, the time to process global queries may exceed an acceptable range. An exhaustive search of all the component databases can be avoided by implementing a sophisticated query optimization strategy. SQP techniques can reduce the size of the relevant search space by selecting an appropriate minimal set of candidate databases.

• Optimization of Subqueries: Semantic query processing does not terminate at the global schema level after the global query has been optimized. Subqueries of the global query need to be optimized further using additional semantic query optimization techniques.

• Generation of Missing Data: One of the problems faced during the integration of partial results is missing data. This problem arises because the component databases are incomplete, and it may be resolved using GICs.

• Resolution of Data Inconsistencies: Data inconsistency is another problem that must be solved when integrating the partial results obtained by processing subqueries against their respective databases. This problem arises because of the uncontrolled redundancy inherent in heterogeneous environments. Semantic knowledge can be used to overcome this problem.
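Two of the techniques listed above, null-query rejection and deduction of query results, can be sketched for conjunctive equality qualifications using the global constraint FAC-RANK = 'Professor' → FAC-OFFICE-TYPE = 'A' from Section 1.3.1. This is a simplified Python illustration, not the algorithm of [RSG92]: it handles only equality constraints and forward-chains them over the query qualification.

```python
from typing import Dict, List, Optional, Tuple

Equality = Tuple[str, str]       # (attribute, value)
GIC = Tuple[Equality, Equality]  # premise -> conclusion

# The global constraint discussed in Section 1.3.1.
GICS: List[GIC] = [(("FAC-RANK", "Professor"), ("FAC-OFFICE-TYPE", "A"))]


def close_qualification(qual: Dict[str, str],
                        gics: List[GIC]) -> Optional[Dict[str, str]]:
    """Forward-chain equality GICs over a conjunctive qualification.
    Returns None if the qualification contradicts a GIC (a null query);
    otherwise returns the qualification extended with deduced values."""
    facts = dict(qual)
    changed = True
    while changed:
        changed = False
        for (a, v), (b, w) in gics:
            if facts.get(a) != v:
                continue
            if b not in facts:
                facts[b] = w   # deduced value: no subquery needed for b
                changed = True
            elif facts[b] != w:
                return None    # contradiction: reject the query early


    return facts


# Null query: professors with office type 'B'.
print(close_qualification({"FAC-RANK": "Professor", "FAC-OFFICE-TYPE": "B"}, GICS))  # None
# Deduction: the office type of professors follows from the GIC.
print(close_qualification({"FAC-RANK": "Professor"}, GICS))
# {'FAC-RANK': 'Professor', 'FAC-OFFICE-TYPE': 'A'}
```

The loop terminates because each pass either adds a new attribute or stops, and the number of attributes is finite.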

This semantic query processing approach requires a consistent set of global integrity constraints. However, changes in local database semantics are not easily reflected in the structure or semantic knowledge at the global level. In the following section we provide some insight into how a more active architecture may be able to provide a consistent global representation.
Figure 3: Architecture of the proposed HDDBMS


3 Maintaining Consistent Global Integrity Constraints

Query processing in an HDDBMS can be improved using GICs, but only if a consistent set of GICs is available. Since these constraints are derived from the local constraints, any change in the semantics of a local schema affects the set of local integrity constraints associated with that schema, and the corresponding change must be reflected in the GICs. Currently, objects in the local object schema and in the global schema are passive, in the sense that they contain no methods and must be redefined whenever there is any change in the local schema. Our plan is to make these objects active, so that whenever a change occurs in the local schema, the objects in the top three layers evolve to cope with it. For example, consider the system architecture shown in Figure 3. In the "passive world", a local database administrator would contact the global administrator to register a change in the local schema, and the global administrator would change the local object schema and all other layers and distribute new copies of the global schema to the user sites. In the "active world", the changes in the local schema would be reflected in the local object schema, inconsistencies would be identified, and, when possible, a consistent global schema would be produced automatically.

The schema evolution process has been studied in the context of object-oriented databases [BCG+90]. This work mainly concentrates on three kinds of schema evolution: (i) changes to the contents of an object class (e.g., changes to an instance variable or method); (ii) changes to relations among the object classes; and (iii) addition or removal of object classes from the schema. Because incremental growth is one of the desired features of an HDDBMS, such schema evolution is applicable to HDDBMS. Automatic schema evolution makes it easier to add a new database to an existing HDDBMS. However, existing evolution mechanisms are not adequate for our requirements. Whenever a change occurs in the semantics of an attribute in the local schema, the change must be reflected in all transformation maps pertinent to that attribute in the different layers; further, the corresponding local integrity constraints (LICs) and GICs must be modified. We are currently investigating methods that make objects in different layers active, so that they can be used in our four-layered architecture to generate current composite transformation maps and consistent global integrity constraints.

Some examples of the uses of active objects include the identification of invalid instances of both transformation maps and global constraints. The layers of the architecture, the transformation maps, and the global constraints can be provided with methods or message-passing capabilities that allow for notification of changes in these objects' states. For example, assume that the constraint in Figure 2 on the local FACULTY relation is changed so that only those faculty members whose salary is more than 150K dollars per year get office type 'A'. This situation requires that one of the previously generated GICs be invalidated and a new GIC be generated in its place. We propose to generate GICs and define demons that monitor changes in their component LICs. Whenever one of the components changes, these demons invoke a method to reconstruct the GIC to reflect the local changes. Similarly, if the semantics of F-PAY are changed so that it gives annual salaries in rupees, the corresponding transformation map must be changed. The method for constructing the composite transformation map may access global ontologies or conversion routine libraries. If such automatic construction is not possible, the system designer should be automatically notified of the impact of these changes.
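The demon mechanism described above resembles the observer pattern: each GIC registers a callback on its component LICs, invalidates itself when one changes, attempts to rebuild, and notifies the designer when rebuilding fails. In this hypothetical Python sketch, constraints are plain strings and `chain` is a deliberately naive stand-in for a real GIC derivation procedure (it chains "A -> B" and "B -> C" by exact string match); all class and function names are assumptions.

```python
from typing import Callable, List, Optional


class LocalConstraint:
    """A local integrity constraint (LIC) that notifies its demons on change."""

    def __init__(self, text: str):
        self.text = text
        self._demons: List[Callable[["LocalConstraint"], None]] = []

    def attach(self, demon: Callable[["LocalConstraint"], None]) -> None:
        self._demons.append(demon)

    def change(self, new_text: str) -> None:
        self.text = new_text
        for demon in list(self._demons):
            demon(self)


class GlobalConstraint:
    """A GIC that monitors its component LICs and rebuilds itself on change."""

    def __init__(self, components: List[LocalConstraint],
                 rebuild: Callable[[List[str]], str], notices: List[str]):
        self.components = components
        self.rebuild = rebuild  # stand-in for a GIC derivation procedure
        self.notices = notices  # messages for the system designer
        self.text: Optional[str] = None
        self.valid = False
        self._refresh()
        for lic in components:
            lic.attach(self._on_change)  # one demon per component LIC

    def _refresh(self) -> None:
        try:
            self.text = self.rebuild([c.text for c in self.components])
            self.valid = True
        except ValueError as exc:
            self.valid = False  # automatic reconstruction failed
            self.notices.append(str(exc))

    def _on_change(self, lic: LocalConstraint) -> None:
        self.valid = False  # the previously generated GIC is now suspect
        self._refresh()


def chain(texts: List[str]) -> str:
    """Hypothetical rebuild: derive 'A -> C' from 'A -> B' and 'B -> C'."""
    (a, b1), (b2, c) = (t.split(" -> ") for t in texts)
    if b1 != b2:
        raise ValueError(f"cannot derive a GIC from {texts}")
    return f"{a} -> {c}"


notices: List[str] = []
lic1 = LocalConstraint("FAC-RANK='Professor' -> SALARY>100K")
lic2 = LocalConstraint("SALARY>100K -> FAC-OFFICE-TYPE='A'")
gic = GlobalConstraint([lic1, lic2], chain, notices)
lic2.change("SALARY>150K -> FAC-OFFICE-TYPE='A'")  # the change discussed above
print(gic.valid, notices)
```

After the change, the naive rebuild cannot chain the two constraints, so the GIC is marked invalid (its old text is kept but flagged) and a notice is queued for the designer.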

The four-layered architecture provides a well-defined set of integration stages. We believe that enhancing this architecture with active capabilities will allow for automatic recognition and resolution of conflicts that result from changes in the semantics at the local schema.

References 


[BCG+90] J. Banerjee, H. T. Chou, J. F. Garza, W. Kim, D. Woelk, N. Ballou, and H. J. Kim. Data model issues for object-oriented applications. In S. B. Zdonik and D. Maier, editors, Readings in Object-Oriented Database Systems, pages 161-213. Morgan Kaufmann Publishers, Inc., 1990.

[CFM84] U. Chakravarthy, D. Fishman, and J. Minker. Semantic query optimization in expert systems and database systems. In Proceedings of the First Intl. Conference on Expert Database Systems, pages 326-340, 1984.

[HZ80] M. Hammer and S. Zdonik. Knowledge-based query processing. In Proceedings 6th VLDB, pages 137-146, 1980.

[Kin81] J. King. QUIST: A system for semantic query optimization in relational databases. In Proceedings 7th VLDB, pages 510-517, 1981.

[NEL86] S. B. Navathe, R. Elmasri, and J. Larson. Integrating user views in database design. Computer, 19, 1986.

[Red90] M. P. Reddy. Heterogeneous Distributed Database Management Systems: Modeling and Managing Heterogeneous Data. PhD thesis, School of Mathematics & Computer/Information Science, University of Hyderabad, India, 1990.

[RPG92] M. P. Reddy, B. E. Prasad, and A. Gupta. Formulating global integrity constraints during derivation of global schema. Submitted to Knowledge and Data Engineering, 1992.

[RPR89] M. P. Reddy, B. E. Prasad, and P. G. Reddy. A methodology for resolving semantic incompatibilities and data inconsistencies in integrating heterogeneous databases. In Proc. Int. Conference on Management of Data, Hyderabad, India, 1989.

[RSG92] M. P. Reddy, M. Siegel, and A. Gupta. Semantic query processing in HDDBMS. Submitted to VLDB Journal, 1992.

[SSS91] M. Siegel, S. Salveter, and E. Sciore. Automatic rule derivation for semantic query optimization. Accepted for publication in Transactions on Database Systems, 1991.

